[MXNET-862] Basic maven jenkins pipeline #13450

zachgk · 2018-11-29T03:44:33Z

Description

This is a ~~WIP~~ PR ~~to share progress~~ on the maven nightly jenkins pipeline. The goal is to create a Jenkins for continuous deployment that will build scala jars, test them on multiple different environments, and then deploy them to the apache snapshot repository. It can also provide some setup for building out additional continuous deployment efforts for other platforms such as pip.

~~Note that this PR depends on #13046 and is rebased on top of it. Those changes are displayed as part of the first commit in the PR.~~

@nswamy @lanking520 @andrewfayres @piyushghai @marcoabreu @yzhliu

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

Build pipeline for build, test, deploy

zachgk · 2018-11-29T03:45:27Z

@mxnet-label-bot add [Java, Scala, CI, Maven]

marcoabreu · 2018-11-29T18:19:54Z

Lets wait until after the release before we move forward with this PR as discussed on dev@ :)

piyushghai · 2018-12-04T17:55:01Z

ci/Jenkinsfile_utils.groovy

+        if(clean == 1)
+          clean = 'git clean -xdf'
+        else if(clean == 2)
+          clean = 'git clean -xdff'


Is there a difference between the clean command in clean == 1 and clean == 2 ?

The clean -ff will remove code inside the .git directory which messes with the deploy process. See https://git-scm.com/docs/git-clean

Please leave this as it is.

The workspace always has to be considered empty and reproducible. If you need to persist something in between stages, use the stashing mechanism. With this change you are circumventing a problem - that you are not persisting state properly - and it will blow up as soon as we have more than one restricted slave and job one gets scheduled onto a different slave than job two

This change is not to persist something in between stages. During maven deploy, it attempts to move all code into a temp directory, rebuild, and deploy from that temp directory. The rebuilding calls "git submodule update" which breaks if the .git directory has already been cleaned with "-ff" since the submodule configuration is deleted. I keep the default as "clean -xdff" so that keeping the .git directory must be explicitly chosen and does not affect any other code.

I think I'd like to have a deeper look into that, so far I'm not following

The maven release plugin messes with the project SCM (git). It first tries to create a new tag for the release and a new commit updating the version number located in the POM file. Then, when performing the release, it will checkout the code to a temporary directory and perform the release (building, testing, and deploying) from that new clean directory. This will break if the .git directory has been pruned. You can see docs for the maven release plugin at http://maven.apache.org/maven-release/maven-release-plugin/examples/prepare-release.html

piyushghai · 2018-12-04T17:58:46Z

scala-package/examples/pom.xml

@@ -119,6 +119,14 @@
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
+        <executions>


Would the child poms not inherit this execution stage from the parent pom file ?

I don't think the maven-jar-plugin is in the parent pom file because we don't want all of the modules to inherit it

piyushghai · 2018-12-04T18:06:05Z

scala-package/packageTest/README.md

@@ -0,0 +1,50 @@
+# MXNet Scala Package Test


This README is super helpful!
Thanks @zachgk 💯

piyushghai · 2018-12-04T18:06:29Z

scala-package/packageTest/Makefile

+
+SCALA_VERSION_PROFILE := 2.11
+SCALA_VERSION := 2.11.8
+MXNET_VERSION := 1.3.1-SNAPSHOT


Should this version remain hardcoded here ?

I will investigate using [1.3.1-SNAPSHOT,) to see if it works. Otherwise, I don't know if there are other options.

piyushghai · 2018-12-04T18:07:59Z

Makefile

 		    -Dbuild.platform="$(SCALA_PKG_PROFILE)" \
 			-Dcflags="$(CFLAGS)" -Dldflags="$(LDFLAGS)" \
 			-Dlddeps="$(LIB_DEP) $(ROOTDIR)/lib/libmxnet.a")

+scaladeploylocal:
+	(cd $(ROOTDIR)/scala-package; \
+		mvn deploy $(MAVEN_ARGS) -Papache-release,$(SCALA_PKG_PROFILE),$(SCALA_VERSION_PROFILE) \-DskipTests=true -Dcxx="$(CXX)" \


Is there a reason why you are skipping the tests here on the deploy stage ?

We are skipping those tests because we run the tests separately with scalaunittest/scalaintegrationtest before calling deploy and there is no reason to repeat tests

piyushghai · 2018-12-04T18:11:54Z

Makefile

@@ -454,6 +454,10 @@ else
 	CFLAGS += -DMXNET_USE_LIBJPEG_TURBO=0
 endif

+ifeq ($(CI), 1)
+	MAVEN_ARGS := -B


Curious to know what this -B flag does :)

The -B is batch mode. The main change is that it removes the downloading progress messages that are written to the terminal/logs. They take up a lot of space and make the logs hard to read.

Progress

Roshrini · 2018-12-19T19:33:53Z

@zachgk Can you rebase this PR?
@mxnet-label-bot Update [pr-work-in-progress]

zachgk · 2018-12-20T02:01:52Z

I am currently thinking that since these are relatively large commits, I would split them into three different PRs that will create respectively a basic pipeline, a pipeline with multi-environment testing, and the final pipeline integrating the static linking code. After rebasing, I reduced the amount of code as part of this PR (it has some of commit 2), so it should be roughly equal to the code from 20 days ago. This also no longer has code from the packageTest utility because that was merged into master.

@mxnet-label-bot update [CI, Java, Maven, Scala, pr-awaiting-review]

lanking520

Thanks for your contribution, please indicate the change you would like to make for your second PR

zachgk · 2018-12-20T20:15:27Z

As part of the later PRs, the changes to the current files will include:

Add new Docker containers specific to publishing and a clean dependency-less environment for testing so the updates to ubuntu_scala will be moved
The scala deployment will be moved from scala-package/dev to ci/publish/scala
It will use the static building script instead of the scala-package/dev/compile-mxnet-backend.sh script for building the backend

Note that this PR forms a working pipeline that will build, test, and then deploy the package for ubuntu 16.04

Please review @nswamy @andrewfayres @piyushghai @marcoabreu

piyushghai · 2018-12-21T18:37:26Z

ci/publish/Jenkinsfile

+utils.assign_node_labels(utility: 'restricted-utility', linux_cpu: 'restricted-mxnetlinux-cpu', linux_gpu: 'restricted-mxnetlinux-gpu', linux_gpu_p3: 'restricted-mxnetlinux-gpu-p3', windows_cpu: 'restricted-mxnetwindows-cpu', windows_gpu: 'restricted-mxnetwindows-gpu')
+
+def nodeMap = ['cpu': NODE_LINUX_CPU, 'gpu': NODE_LINUX_GPU]
+def scalaOSMap = ['cpu': 'linux-x86_64-cpu', 'gpu': 'linux-x86_64-gpu']


Can you add a comment here as to why we are not doing MacOS through Jenkins ?

marcoabreu · 2018-12-23T21:39:18Z

ci/Jenkinsfile_utils.groovy

+          cleanCmd = 'git clean -xdf'
+        else if(clean == 2)
+          cleanCmd = 'git clean -xdff'
+        sh cleanCmd


Could you elaborate why this is needed? This "super brutal clean" as I like to call it, was requested by @lebeg and he said it's important. Maybe you two could have a discussion there :)

Actually, I can remove this because we are no longer going to use the maven-release-plugin following updates as part of #13626

marcoabreu · 2018-12-23T21:40:26Z

ci/publish/Jenkinsfile

+def nodeMap = ['cpu': NODE_LINUX_CPU, 'gpu': NODE_LINUX_GPU]
+def scalaOSMap = ['cpu': 'linux-x86_64-cpu', 'gpu': 'linux-x86_64-gpu']
+
+def wrapStep(nodeToRun, workspaceName, step) {


I love this generator model. Great job!

marcoabreu · 2018-12-23T21:41:50Z

ci/publish/Jenkinsfile

+  return {
+    node(nodeToRun) {
+      ws("workspace/${workspaceName}") {
+        env.MAVEN_OPTS = "-Dorg.slf4j.simpleLogger.log.org.apache.maven.cli.transfer.Slf4jMavenTransferListener=warn" // Clean up maven logs


Would it be possible to move this logic into the runtime functions? We generally try to keep Jenkinsfile limited to process flow design while logic is in the runtime functions. Otherwise, you can't reproduce the individual steps using the docker wrapper because certain variables (like this env var) are set outside of the docker process.

Makes sense. I'll move it

marcoabreu · 2018-12-23T21:43:42Z

scala-package/dev/buildkey.py

+            logging.exception("The request was invalid due to:")
+        elif client_error.response['Error']['Code'] == 'InvalidParameterException':
+            logging.exception("The request had invalid params:")
+        else:


I know this is the default template of AWS SecretsManager, but could we raise always (instead of the else case) to halt the execution? I have made the experience that letting the process continue to then fail later obfuscated the error message and confuses some people.

In this particular case, the function would return None while the caller expects a tuple. This then throws a confusion message which makes it harder to debug. wdyt?

Yeah, this sounds like a good improvement. I'll apply it in docker_cache as well for the sake of consistency

marcoabreu · 2018-12-23T21:45:10Z

scala-package/dev/buildkey.py

+def importASC(key, gpgPassphrase):
+    filename = os.path.join(KEY_PATH, "key.asc")
+    with open(filename, 'w') as f:
+        f.write(key)


Are we making sure that this file is deleted when the process finishes? Or is it local to the container?

Nice catch, same applies to the hidden settings.xml in .m2 folder

It is put in the ~ directory, which I think that makes it local to the container. Either way, I will add some rm statements for extra security.

marcoabreu · 2018-12-23T21:47:26Z

scala-package/dev/buildkey.py

+        f.write(key)
+    subprocess.run(['gpg2', '--batch', '--yes',
+                    '--passphrase=\"{}\"'.format(gpgPassphrase),
+                    "--import", "{}".format(filename)])


Ha! Pro tip - I currently have to write a COE because I made the same mistake - don't do that or you will leak credentials :P

Subprocess.run will print the command (in this case including the password) if the execution fails. This then writes the password to the log and you will leak credentials. Is it possible to pass in the password using STDIN or some other redirection method?

I can link you the ticket privately if you're interested in further details.

@marcoabreu Currently, we use fake gpg key in here as it is not necessary to have the real one in the deploy stage. Problems will appear in the release stage as I would not do it with CI. But still it is very useful information, please keep us updated if you find a better solution to this.

@lanking520 We still use this to encrypt the passwords through maven, so this is certainly a problem we need to address.

I worked around this using a combination of passing data through stdin and passing data using the linux expect program (because maven doesn't work with stdin)

marcoabreu · 2018-12-23T21:47:57Z

scala-package/dev/buildkey.py

+                .format(password))
+
+
+def severPSW(username, password, gpgPassphrase):


what does "sever" mean? 🤔

The Apache server

@lanking520 "server" was misspelled

lanking520 · 2019-01-03T17:41:28Z

@marcoabreu Could you please review again?

piyushghai

Thanks for these changes. Looks good to me :)

lanking520

Still looks good. Tested locally with the build steps and all working!

marcoabreu

Minor nits, but overall looks great :)

marcoabreu · 2019-01-07T11:37:11Z

scala-package/dev/build.sh

+
+# Scala steps to deploy
+make scalapkg CI=1
+make scalaunittest CI=1


The filename lets me assume the task of this script is to build, but it seems like a lot of testing is happening as well.

Generally, we try to split compilation and testing first to make it cleaner, and second to use the most appropriate instance type. Since you also made a test script, could we move these tests over there to keep the compilation jobs as lightweight as possible?

marcoabreu · 2019-01-07T11:39:49Z

scala-package/dev/deploy.sh

+
+if [[ $MAVEN_PUBLISH_OS_TYPE == "linux-x86_64-cpu" ]];
+then
+    MAKE_FLAGS="USE_BLAS=openblas USE_DIST_KVSTORE=1 ENABLE_TESTCOVERAGE=1"


Do we want to compile with test coverage annotations for production builds?

marcoabreu · 2019-01-07T11:40:01Z

scala-package/dev/deploy.sh

+    MAKE_FLAGS="USE_BLAS=openblas USE_DIST_KVSTORE=1 ENABLE_TESTCOVERAGE=1"
+elif [[ $MAVEN_PUBLISH_OS_TYPE == "linux-x86_64-gpu" ]]
+then
+    MAKE_FLAGS="USE_OPENCV=1 USE_BLAS=openblas USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda USE_CUDNN=1 SCALA_ON_GPU=1 SCALA_TEST_ON_GPU=1 USE_DIST_KVSTORE=1 ENABLE_TESTCOVERAGE=1"


* Jenkins Publish Nightly Maven Progress * Seperate Build, Test, and Deploy Stages with parallel

zachgk requested review from marcoabreu, nswamy, szha and yzhliu as code owners November 29, 2018 03:44

marcoabreu added CI Java Label to identify Java API component Maven Scala labels Nov 29, 2018

piyushghai reviewed Dec 4, 2018

View reviewed changes

zachgk force-pushed the mavenPublishing branch 2 times, most recently from 4a1a078 to 88cd89e Compare December 8, 2018 00:28

Jenkins Publish Nightly Maven

91a4495

Progress

marcoabreu added pr-work-in-progress PR is still work in progress and removed CI Java Label to identify Java API component Maven Scala labels Dec 19, 2018

zachgk force-pushed the mavenPublishing branch from 88cd89e to eaf3229 Compare December 20, 2018 01:34

zachgk changed the title ~~[WIP][MXNET-862] Basic maven jenkins pipeline~~ [MXNET-862] Basic maven jenkins pipeline Dec 20, 2018

marcoabreu added CI and removed pr-work-in-progress PR is still work in progress labels Dec 20, 2018

marcoabreu added Java Label to identify Java API component Maven pr-awaiting-review PR is waiting for code review Scala labels Dec 20, 2018

zachgk force-pushed the mavenPublishing branch 2 times, most recently from c03e2e2 to f317589 Compare December 20, 2018 19:30

lanking520 approved these changes Dec 20, 2018

View reviewed changes

zachgk mentioned this pull request Dec 21, 2018

[MXNET-1188] Maven Publishing Test Environments #13707

Merged

5 tasks

piyushghai reviewed Dec 21, 2018

View reviewed changes

zachgk force-pushed the mavenPublishing branch from f317589 to 0c19e1f Compare December 21, 2018 18:56

marcoabreu suggested changes Dec 23, 2018

View reviewed changes

zachgk force-pushed the mavenPublishing branch from 0c19e1f to 80e69fc Compare January 2, 2019 23:46

piyushghai approved these changes Jan 3, 2019

View reviewed changes

zachgk mentioned this pull request Jan 3, 2019

[MXNET-1191] Maven Jenkins Pipeline with Static Binary and GPU #13767

Merged

5 tasks

lanking520 approved these changes Jan 4, 2019

View reviewed changes

marcoabreu suggested changes Jan 7, 2019

View reviewed changes

Seperate Build, Test, and Deploy Stages with parallel

80f5388

zachgk force-pushed the mavenPublishing branch from 80e69fc to 80f5388 Compare January 7, 2019 21:43

marcoabreu approved these changes Jan 8, 2019

View reviewed changes

lanking520 approved these changes Jan 8, 2019

View reviewed changes

lanking520 merged commit 96439e6 into apache:master Jan 8, 2019

piyushghai pushed a commit to piyushghai/incubator-mxnet that referenced this pull request Feb 27, 2019

[MXNET-862] Basic maven jenkins pipeline (apache#13450)

ebb3df3

* Jenkins Publish Nightly Maven Progress * Seperate Build, Test, and Deploy Stages with parallel

zachgk deleted the mavenPublishing branch April 12, 2019 22:26

lanking520 pushed a commit to lanking520/incubator-mxnet that referenced this pull request Apr 26, 2019

[MXNET-862] Basic maven jenkins pipeline (apache#13450)

634a8bd

* Jenkins Publish Nightly Maven Progress * Seperate Build, Test, and Deploy Stages with parallel

lanking520 pushed a commit to lanking520/incubator-mxnet that referenced this pull request Apr 26, 2019

[MXNET-862] Basic maven jenkins pipeline (apache#13450)

15cb272

* Jenkins Publish Nightly Maven Progress * Seperate Build, Test, and Deploy Stages with parallel

lanking520 pushed a commit to lanking520/incubator-mxnet that referenced this pull request Apr 30, 2019

[MXNET-862] Basic maven jenkins pipeline (apache#13450)

4d0d607

* Jenkins Publish Nightly Maven Progress * Seperate Build, Test, and Deploy Stages with parallel

zachgk added a commit to zachgk/incubator-mxnet that referenced this pull request May 16, 2019

[MXNET-862] Basic maven jenkins pipeline (apache#13450)

e288301

* Jenkins Publish Nightly Maven Progress * Seperate Build, Test, and Deploy Stages with parallel

haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019

[MXNET-862] Basic maven jenkins pipeline (apache#13450)

04f1490

* Jenkins Publish Nightly Maven Progress * Seperate Build, Test, and Deploy Stages with parallel

		.format(password))


		def severPSW(username, password, gpgPassphrase):

[MXNET-862] Basic maven jenkins pipeline #13450

[MXNET-862] Basic maven jenkins pipeline #13450

Conversation

zachgk commented Nov 29, 2018 • edited Loading

Description

Checklist

Essentials

Changes

zachgk commented Nov 29, 2018

marcoabreu commented Nov 29, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Roshrini commented Dec 19, 2018

zachgk commented Dec 20, 2018

lanking520 left a comment

Choose a reason for hiding this comment

zachgk commented Dec 20, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lanking520 commented Jan 3, 2019

piyushghai left a comment

Choose a reason for hiding this comment

lanking520 left a comment

Choose a reason for hiding this comment

marcoabreu left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zachgk commented Nov 29, 2018 •

edited

Loading

zachgk commented Dec 20, 2018 •

edited

Loading